Detection of pancreatic cancer with two- and three-dimensional radiomic analysis in a nationwide population-based real-world dataset

Background CT is the major detection tool for pancreatic cancer (PC). However, approximately 40% of PCs < 2 cm are missed on CT, underscoring a pressing need for tools to supplement radiologist interpretation. Methods Contrast-enhanced CT studies of 546 patients with pancreatic adenocarcinoma diagnosed by histology/cytology between January 2005 and December 2019 and 733 CT studies of controls with normal pancreas obtained during the same period in a tertiary referral center were retrospectively collected for developing an automatic end-to-end computer-aided detection (CAD) tool for PC using two-dimensional (2D) and three-dimensional (3D) radiomic analysis with machine learning. The CAD tool was tested in a nationwide dataset comprising 1,477 CT studies (671 PCs, 806 controls) obtained from institutions throughout Taiwan. Results The CAD tool achieved 0.918 (95% CI, 0.895–0.938) sensitivity and 0.822 (95% CI, 0.794–0.848) specificity in differentiating between studies with and without PC (area under curve 0.947, 95% CI, 0.936–0.958), with 0.707 (95% CI, 0.602–0.797) sensitivity for tumors < 2 cm. The positive and negative likelihood ratios of PC were 5.17 (95% CI, 4.45–6.01) and 0.10 (95% CI, 0.08–0.13), respectively. Where high specificity is needed, using 2D and 3D analyses in series yielded 0.952 (95% CI, 0.934–0.965) specificity with a sensitivity of 0.742 (95% CI, 0.707–0.775), whereas using 2D and 3D analyses in parallel to maximize sensitivity yielded 0.915 (95% CI, 0.891–0.935) sensitivity at a specificity of 0.791 (95% CI, 0.762–0.819). Conclusions The high accuracy and robustness of the CAD tool support its potential for enhancing the detection of PC. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-023-10536-8.


Background
Pancreatic cancer (PC), the most lethal cancer, is projected to become the second leading cause of cancer deaths in the US by 2030, with a 5-year survival rate of only about 11% [1,2]. Because patient survival rapidly diminishes with increasing tumor size [3], early detection is the most effective strategy to improve the grave prognosis [4]. Computed tomography (CT) is the major imaging modality for detecting PC, but early PCs are often obscure or even invisible to the naked eye on CT, with approximately 40% of PCs smaller than 2 cm being missed by radiologist interpretation [5]. Furthermore, the diagnostic performance of CT is interpreter-dependent and could be adversely affected by increasing radiologist workload [6]. Therefore, an effective tool that can assist radiologists in detecting PC is urgently needed.
Radiomics is a method that extracts quantitative information on density, shape, and texture from images for subsequent data mining [7]. Analysis of radiomic features with machine learning algorithms has shown great promise in medical image analysis [8]. In a recent proof-of-concept study, we identified distinguishing two-dimensional (2D) radiomic features of PC on CT and showed that patch-based 2D radiomic analysis with a machine learning model could distinguish CT studies of PC patients and controls with 95% accuracy in a local test dataset [8]. However, in that study the pancreas and tumor were manually labeled by radiologists for subsequent radiomic analysis, limiting its applicability in clinical settings. While the adoption of a patch-wise 2D analytic approach in that study enabled detailed fine-grained assessment in each subregion of the pancreas, loss of information was inevitable compared with three-dimensional (3D) analytic approaches [9]. The generalizability of the trained model to external datasets also needs further validation. To be clinically applicable, a computer-aided detection (CAD) tool must achieve segmentation (i.e., identifying the pancreas and tumor) and classification (PC vs non-PC) with minimal human labor and deliver robust performance in real clinical settings.
Therefore, this study investigated 2D and 3D radiomic analysis for detecting PC on CT, with automatic segmentation of the pancreas and tumor by deep learning (DL). An end-to-end CAD tool based on 2D and 3D radiomic analysis combined was further developed and tested with prospectively collected CT images from real clinical practice throughout Taiwan to ascertain its generalizability.

Methods
This retrospective study was conducted in accordance with the Declaration of Helsinki and approved by the Institute Research Ethical Committee of National Taiwan University Hospital (NTUH 201710050RINA, 201904116RINC), which waived the requirement for informed consent from individual patients.

Local dataset and manual image segmentation
Patients with histologically or cytologically confirmed pancreatic adenocarcinoma were identified from the Cancer Registry of National Taiwan University Hospital (NTUH), a tertiary referral center with a large volume of PC. CT images of those PC patients and subjects with normal or unremarkable pancreas according to the formal radiologist reports were extracted from the imaging archive of NTUH for further review to construct the local datasets. If an individual patient underwent multiple CT examinations, only the one that immediately preceded the diagnosis of PC was used. In total, contrast-enhanced portal venous CT images of 546 confirmed PC patients between January 1, 2005, and December 31, 2019, and 1,466 subjects who underwent CT during the same period with a negative or unremarkable pancreas in the radiologist report were included in the local dataset. 733 controls were randomly selected from the local dataset to construct the nationwide test set (see below), whereas all cases and the remaining controls were randomly divided into a local training set (437 PCs, 586 controls) and a validation set (109 PCs, 147 controls) (Fig. 1).
CT examinations included in this study were performed with one of 47 scanners from six different manufacturers (GE Healthcare, Siemens Healthcare, Philips Healthcare, Toshiba, Picker International, and Hitachi Medical Corporation) with 100, 120, or 130 kV, automatic mA control, and without noise reduction. All CT images were obtained in the portal venous phase with intravenous administration of contrast medium (1.5 mL per kg body weight, with an upper limit of 150 mL). Each slice had a size of 512 × 512 pixels, with a thickness ranging between 0.7 mm and 1.5 mm that was reconstructed into 5-mm slices for subsequent analysis. The pancreas and tumor on the CT images of PC patients were manually segmented and labeled as regions of interest (ROIs) for model training by one of two experienced abdominal radiologists (P-TC and K-LL) who had over 8 years and 23 years of experience, respectively, with reference to the results of other examinations and surgical findings when needed.

Population-based test set
Taiwan's National Health Insurance (NHI) is a single-payer compulsory health insurance program that covers inpatient and outpatient care for 99.8% of the population and is contracted with 92.7% (21,463/23,164) of the institutions in Taiwan [10]. Up to July 2021, around 2.9 million imaging studies performed in daily clinical practice were uploaded by institutions throughout Taiwan to the Applications of Artificial Intelligence in Medical Images Database of NHI. Patients diagnosed with PC throughout Taiwan were identified by searching for recipients of the Severe Illness Certificate issued for the International Classification of Diseases-10th Revision-Clinical Modification (ICD-10-CM) [11] code of C25 (malignant neoplasm of the pancreas) and its sub-items in the NHI Major Illness/Injury Certificate database, and the Applications of Artificial Intelligence in Medical Images Database was searched to retrieve CT studies of those patients with PC. If multiple CT studies were identified for an individual patient, only the study closest to the date of certificate issuance was retrieved, yielding CT studies of 671 cases with newly confirmed PC diagnosis between January 1, 2018, and July 31, 2019, throughout Taiwan. For controls with normal pancreas, CT studies performed for pre-donation evaluation of all kidney donors and liver donors during the same period were retrieved from the Applications of Artificial Intelligence in Medical Images Database and reviewed by a radiologist (P-TC) to confirm the absence of radiological abnormalities in the pancreas, yielding 73 control subjects. To balance the number of cases and controls, the 733 controls randomly selected from the local dataset were combined with the controls from the NHI dataset to form the nationwide test set (671 PCs, 806 controls) (Fig. 1).

Two- and three-dimensional radiomic analyses
The radiomics-based classification module consisted of three components: a 3D radiomics model (3D analysis), a 2D patch-based radiomics model (2D analysis), and a logistic regression model using the outputs of the former two radiomics models as independent variables (combined analysis) to predict the probability of PC. The 3D analysis examined the whole pancreas by 3D radiomic features, whereas in the 2D analysis the pancreas on each 2D slice was cropped into patches for subsequent extraction of 2D radiomic features.
For 3D analysis, the pancreas including the tumor (if present) was segmented using a previously trained automatic deep learning segmentation model [12]. This segmentation model was trained with CT images of 437 PC patients from the NTUH training and validation datasets and those of 393 subjects from three external datasets based on a model from coarse-to-fine network architecture search (C2FNAS) [13] (Additional file 1). For model training in the 2D analysis, segmentation of the pancreas and tumor was performed manually by radiologists in the images of PC patients and by the automatic segmentation model in the images of controls. During model testing, the automatic segmentation model was used for segmenting the pancreas and tumor in both 2D and 3D analyses.
All radiomic features in this study were extracted using an open-source platform (PyRadiomics) [14]. To eliminate bias resulting from the differences in spacing when extracting the radiomic features, all the images and segmentation labels were resampled to the spacing of 1 × 1 × 5 mm using linear interpolation and nearest-neighbor interpolation, respectively. The bin width for computing texture features was fixed at 16.
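To illustrate what a fixed bin width of 16 means for texture features, the following minimal sketch discretizes gray levels into 1-based bins. It assumes the fixed-bin-width scheme described in the PyRadiomics documentation (bin index = floor(x / W) - floor(min / W) + 1); the function name and the sample intensities are hypothetical and not taken from the study.

```python
import math


def discretize_fixed_bin_width(values, bin_width=16):
    """Map gray levels to 1-based bin indices using a fixed bin width.

    Assumed scheme: floor(x / W) - floor(min(x) / W) + 1.
    """
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]


# Hypothetical CT intensities (HU) inside a region of interest.
hu_values = [-20, -5, 0, 15, 16, 47, 48, 90]
bins = discretize_fixed_bin_width(hu_values, bin_width=16)
print(bins)
```

With a fixed width, intensities that differ by less than 16 HU can share a bin, so texture matrices are built on a comparable gray-level scale across scanners.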

Training of classification models based on three-dimensional radiomic features
For 3D analysis, a 3D radiomics model was trained from the NTUH training set. The union of the pancreas and tumor, segmented by the automatic segmentation model, in the 3D CT volume served as the volume of interest (VOI) for subsequent extraction of 3D radiomic features. A total of 1183 features, including first-order and higher-order texture features on original and filtered images, were extracted from the VOI (Additional file 1). To differentiate whether a pancreas included a tumor based on 3D radiomic features, the 1183 features of all data from the training set were input into XGBoost, a widely used machine learning algorithm based on gradient boosting decision trees [15], to train a classification model. To mitigate overfitting, the training process was terminated when the area under the receiver operating characteristic curve (AUC) on the validation set had not increased for 30 iterations, and the model in the training process that had the highest AUC on the validation set was selected as the final model for 3D analysis. The loss function of this XGBoost model was set as logistic loss and the resultant probability served as the output of the 3D analysis.
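The stopping rule described above can be sketched independently of any library. The snippet below is a minimal illustration, assuming that the validation AUC is recorded once per boosting iteration; the function name and the AUC trace are hypothetical. In XGBoost itself this behavior is typically requested through its early-stopping option rather than written by hand.

```python
def select_best_round(validation_aucs, patience=30):
    """Return (best_round, best_auc, stopped_round) for patience-based stopping.

    Training stops once the validation AUC has not increased for
    `patience` consecutive iterations; the best iteration is kept.
    """
    best_round, best_auc = -1, float("-inf")
    for round_idx, auc in enumerate(validation_aucs):
        if auc > best_auc:
            best_round, best_auc = round_idx, auc
        elif round_idx - best_round >= patience:
            return best_round, best_auc, round_idx
    return best_round, best_auc, len(validation_aucs) - 1


# Hypothetical trace: AUC improves for three rounds, then plateaus.
trace = [0.80, 0.85, 0.90] + [0.89] * 40
best_round, best_auc, stopped_round = select_best_round(trace, patience=30)
print(best_round, best_auc, stopped_round)
```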

Training of classification models based on two-dimensional patch-based radiomic features
In 2D analysis, every patch generated from the ROI (i.e. pancreas and tumor) in a CT study was analyzed by a 2D patch-based radiomics model to predict the probability of cancer in each patch, and the patient was predicted as with or without PC by jointly considering the predicted probabilities of all patches of the patient. The union of pancreas and tumor on the images was set as the ROI. After resampling, the images were cropped into 20 × 20 pixel square subregions (i.e. patches) on the axial (x-y) plane, using a moving window with a stride of five pixels. The patches which had more than 5% of the area overlapping with ROI were treated as valid patches and included in the analysis. The patches containing any portion of the tumor were labeled as cancerous patches, whereas the patches containing no tumor were labeled as non-cancerous. A total of 545 features were extracted from the ROI in each patch (Additional file 1). All the features of patches in the training set were then input into XGBoost with the logistic loss function for distinguishing cancerous patches from non-cancerous patches.
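The patch generation and labeling rules in this paragraph can be sketched as follows. This is a simplified illustration on a single slice using nested lists as binary masks; the function names, the toy mask sizes, and the handling of image borders (patches that would cross the border are skipped) are assumptions, not details from the study.

```python
def extract_valid_patches(roi_mask, patch_size=20, stride=5, min_overlap=0.05):
    """Return top-left (row, col) corners of patches whose overlap with the
    ROI exceeds `min_overlap` of the patch area."""
    n_rows, n_cols = len(roi_mask), len(roi_mask[0])
    patch_area = patch_size * patch_size
    corners = []
    for top in range(0, n_rows - patch_size + 1, stride):
        for left in range(0, n_cols - patch_size + 1, stride):
            roi_pixels = sum(
                roi_mask[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            )
            if roi_pixels / patch_area > min_overlap:
                corners.append((top, left))
    return corners


def is_cancerous_patch(tumor_mask, top, left, patch_size=20):
    """A patch is labeled cancerous if it contains any tumor pixel."""
    return any(
        tumor_mask[r][c]
        for r in range(top, top + patch_size)
        for c in range(left, left + patch_size)
    )


# Toy 50 x 50 slice: a 10 x 10 ROI containing a 2 x 2 tumor in one corner.
size = 50
roi = [[1 if 20 <= r < 30 and 20 <= c < 30 else 0 for c in range(size)]
       for r in range(size)]
tumor = [[1 if 20 <= r < 22 and 20 <= c < 22 else 0 for c in range(size)]
         for r in range(size)]

patches = extract_valid_patches(roi)
n_cancerous = sum(is_cancerous_patch(tumor, top, left) for top, left in patches)
print(len(patches), n_cancerous)
```

Because the stride (5 pixels) is smaller than the patch size (20 pixels), each pixel of the ROI is covered by several overlapping patches, which is what later allows per-pixel averaging of patch predictions.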
To determine whether a patient had a tumor, a heatmap was generated for each patient by aggregating the predictions from the 2D patch-based radiomic model. More specifically, all valid patches extracted from the ROI of a specific patient were input into the trained XGBoost model to obtain the probabilities of having a tumor, and then the prediction results of all patches were assembled into a heatmap. The value of each pixel in the heatmap was defined as the average of the predicted probabilities of the patches that contained this pixel. Then the area of the high-risk region of a CT study was defined as the area of the largest region formed by contiguously neighboring high-risk pixels among all the axial planes, with the threshold for classifying a pixel as high-risk set at the value corresponding to the highest AUC in differentiating between PCs and controls in the validation set, searched from 0.05 to 0.95 with a step of 0.01. The area of the high-risk region was used as the output of the 2D analysis.
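A minimal sketch of the heatmap aggregation and of the largest high-risk region on one axial slice is given below. Several details are assumptions made for illustration: pixels covered by no patch are set to 0, a pixel counts as high-risk when its value is at or above the threshold, and contiguity is taken as 4-connectivity. The function names and the toy inputs are hypothetical.

```python
from collections import deque


def build_heatmap(shape, patch_probs, patch_size=20):
    """Average the predicted probabilities of all patches covering each pixel."""
    n_rows, n_cols = shape
    total = [[0.0] * n_cols for _ in range(n_rows)]
    count = [[0] * n_cols for _ in range(n_rows)]
    for (top, left), prob in patch_probs.items():
        for r in range(top, top + patch_size):
            for c in range(left, left + patch_size):
                total[r][c] += prob
                count[r][c] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


def largest_high_risk_area(heatmap, threshold):
    """Pixel count of the largest 4-connected region with value >= threshold."""
    n_rows, n_cols = len(heatmap), len(heatmap[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    best = 0
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or heatmap[r0][c0] < threshold:
                continue
            seen[r0][c0] = True
            queue, area = deque([(r0, c0)]), 0
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                area += 1
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and not seen[nr][nc]
                            and heatmap[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            best = max(best, area)
    return best


# Toy slice with two overlapping 4 x 4 patches (small size for readability).
toy_probs = {(0, 0): 0.9, (2, 2): 0.1}
heatmap = build_heatmap((6, 6), toy_probs, patch_size=4)
area_strict = largest_high_risk_area(heatmap, threshold=0.6)
area_loose = largest_high_risk_area(heatmap, threshold=0.05)
print(area_strict, area_loose)
```

For a whole CT study, the output of the 2D analysis would then be the maximum of `largest_high_risk_area` over the heatmaps of all axial slices.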

Development of CAD tool based on automatic segmentation and two- and three-dimensional radiomic analysis combined
An automatic CAD tool for PC comprising a DL-based segmentation module and a radiomics-based classification module was developed (Fig. 2). The pancreas and tumor (if present) on CT images were first automatically segmented by the previously trained DL segmentation model [12] and then subject to the extraction of 3D and 2D radiomic features, which were subsequently analyzed by the trained 3D and 2D radiomic analysis models, respectively. The resultant probability of PC generated from the 3D analysis and the area of the high-risk region from the 2D analysis were then input into a logistic regression model trained with the validation set to yield the final prediction regarding whether the CT study harbored PC (combined analysis).
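The combined analysis can be sketched as a two-variable logistic regression on the 3D probability and the 2D high-risk area. The coefficients below are hypothetical placeholders chosen only to make the example run; they are not the fitted values from the study.

```python
import math


def combined_probability(prob_3d, area_2d, intercept, coef_3d, coef_2d):
    """Logistic regression: logit(p) = intercept + coef_3d*prob_3d + coef_2d*area_2d."""
    logit = intercept + coef_3d * prob_3d + coef_2d * area_2d
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients (NOT from the study).
params = {"intercept": -3.0, "coef_3d": 5.0, "coef_2d": 0.01}

p_low = combined_probability(prob_3d=0.10, area_2d=0.0, **params)
p_high = combined_probability(prob_3d=0.90, area_2d=200.0, **params)
print(round(p_low, 3), round(p_high, 3))
```

With positive coefficients, either a higher 3D probability or a larger high-risk area raises the final predicted probability, so a study flagged by only one of the two analyses can still cross the decision cutoff.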

Model testing and statistical analysis
The performance of the automatic CAD workflow with 3D analysis, 2D analysis, and combined analysis was tested with the nationwide test set. The automatic segmentation was used as both the VOI in 3D analysis and the ROI in 2D analysis. To determine whether a patient had PC in each analysis, the cutoff was selected as the threshold that corresponded to the point with the highest Youden index (i.e., sensitivity + specificity - 1) on the ROC curve plotted from the outputs of the corresponding analysis for all patients in the validation set. Model performance was assessed with ROC curves and associated AUCs. Sensitivity, specificity, and accuracy were ascertained with the respective exact confidence intervals (CI) calculated based on binomial distributions. For comparison between groups, the Fisher exact test and the Mann-Whitney U test were used for categorical variables and continuous variables, respectively. Comparison of AUCs between various analyses was conducted using the pROC package in R [16,17] with the paired DeLong method.
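Cutoff selection by the Youden index can be sketched as a scan over candidate thresholds. The snippet assumes that a study is called positive when its score is at or above the cutoff and that ties are resolved in favor of the lowest cutoff; the function name and the validation scores are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing
    sensitivity + specificity - 1, with 'positive' meaning score >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (None, 0.0, 0.0, float("-inf"))
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1 > best[3]:
            best = (cutoff, sens, spec, sens + spec - 1)
    return best[:3]


# Hypothetical validation outputs (1 = PC, 0 = control).
val_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sensitivity, specificity = youden_cutoff(val_scores, val_labels)
print(cutoff, sensitivity, specificity)
```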

Training and testing of three-dimensional radiomics model
The clinical characteristics of the subjects in the local dataset are summarized in [...] (Fig. 3B).
Among the 1183 radiomic features, the p-values of 236 (19.9%) features were smaller than 0.001. The top 10 differential features (i.e., features with the highest gain values) according to the 3D radiomics XGBoost model are summarized in Table 3. The top three differential features were GLCM: correlation (wavelet-LLL), GLCM: IMC2 (original), and GLCM: IMC2 (wavelet-LLL), all of which are generally categorized as texture features as they measure the correlation between the gray level of a voxel and the gray levels of its surrounding voxels in the VOI. Values of these 3 texture features were consistently higher in PCs compared with those in controls in both the local dataset and the nationwide dataset (Fig. 4).
[...] Table 4. The top-ranking feature, NGTDM: busyness (original), is a texture feature that measures the heterogeneity inside the ROI, and its values were higher in cancerous patches compared with noncancerous patches in the local dataset (Fig. 6), indicating that PCs were more heterogeneous with a coarser texture. The second-ranked feature, First order: median (original), measures the median intensity inside the ROI, and its values were lower [...] (Fig. 3B).
Table 2 Performance of 3D analysis, 2D analysis, and combined analysis in various sets
a One control was excluded due to the failure of automatic segmentation

CAD tool for pancreatic cancer detection based on two- and three-dimensional radiomic analysis combined
The trained logistic regression model for generating the final prediction of the probability of PC in the CAD tool was logit(p) [...] (Fig. 3A).

Sensitivity according to tumor size
The sensitivity for PC stratified by tumor size is summarized in Table 5. A positive association between sensitivity and tumor size was noted with either 2D or 3D analysis and combined analysis (all Ptrend < 0.001). For tumors < 2 cm, the CAD tool achieved significantly higher sensitivity

Likelihood ratios of pancreatic cancer
In the nationwide test set, the positive likelihood ratio (+LR) and negative likelihood ratio (-LR) of the CAD tool were 5.17 (4.45-6.01) and 0.10 (0.08-0.13), respectively (Table 2). When both 3D and 2D analyses predicted a study as positive, the +LR of PC increased to 15.32 (11.24-20.87). When both 3D and 2D analyses predicted a study as negative, the -LR of PC was 0.11 (0.08-0.14) (Table 6). For clinical settings where high specificity is needed, using 2D and 3D analyses in series (i.e., the study predicted as positive if both tests were positive) yielded 0.952 (0.934-0.965) specificity with a sensitivity of 0.742 (0.707-0.775). When high sensitivity is needed, using 2D and 3D analyses in parallel (i.e., the study predicted as positive if either test was positive) yielded 0.915 (0.891-0.935) sensitivity at a specificity of 0.791 (0.762-0.819) (Table 2).
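The relation between these quantities can be checked with a short sketch that derives likelihood ratios from sensitivity and specificity and encodes the series and parallel decision rules. Because it starts from the rounded values reported above, the computed +LR (about 5.16) differs slightly from the reported 5.17, which was presumably derived from unrounded counts. The function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (+LR, -LR) computed from sensitivity and specificity."""
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


def combine_in_series(positive_2d, positive_3d):
    """Positive only if both analyses are positive (favors specificity)."""
    return positive_2d and positive_3d


def combine_in_parallel(positive_2d, positive_3d):
    """Positive if either analysis is positive (favors sensitivity)."""
    return positive_2d or positive_3d


# Rounded sensitivity and specificity of the combined analysis.
plr, nlr = likelihood_ratios(0.918, 0.822)
print(round(plr, 2), round(nlr, 2))
```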

Discussion
This study combined a segmentation DL model with 2D and 3D radiomic analysis for detecting PC on contrast-enhanced CT images. We further developed an automatic end-to-end CAD tool which combined 2D and 3D radiomic analysis and did not require manual image preprocessing and labeling/segmentation. In a test set comprising real-world data prospectively collected from institutions across Taiwan, the CAD tool achieved 0.918 sensitivity and 0.822 specificity, confirming its robustness and generalizability. We have previously identified novel distinguishing CT radiomic features of PC and showed that patch-based 2D radiomic analysis with a machine learning model accurately detected PC in an independent local dataset [18]. Another previous study also employed 3D radiomic analysis of the pancreas for distinguishing PC on CT [19] and achieved 0.992 accuracy in a test dataset from the same institution. However, in that study, the generalizability of the trained model was not assessed with external images and the VOIs (i.e., pancreas and tumor) were manually labeled by radiologists. By contrast, in this study the ROIs and VOIs were automatically segmented by our DL segmentation model for subsequent radiomic analysis, and accurate detection of PC was achieved through the combination of patch-wise 2D and volumetric 3D radiomic analysis. In a real-world dataset prospectively collected from institutions throughout Taiwan, our end-to-end CAD tool achieved 0.918 sensitivity (0.707 for PCs < 2 cm) and 0.822 specificity, providing strong support for its robustness and generalizability. This study is the first to conduct an in-depth comparative analysis between adopting 2D versus 3D analytic approaches for radiomic analysis of the pancreas. The findings of this study demonstrated that while 3D radiomic analysis outperformed 2D radiomic analysis for detecting PC, combining both analyses further improved the performance compared with 3D analysis alone.
The major advantage of the 2D analysis was that the pancreas was cropped into overlapping patches that were subject to radiomic analysis individually. Therefore, each fine-grained subregion was subject to multiple rounds of radiomic analysis, each round analyzed with different neighboring subregions, which might thereby increase the sensitivity for detection [20]. However, the 2D analytic approach could not account for the correlations between foci that are adjacent to each other in 3D space but separated into different 2D slices. By contrast, 3D radiomic analysis could capture and take into consideration the correlations between neighboring foci in differentiating between cancerous and noncancerous pancreas. Notably, combining 2D and 3D analysis significantly improved the sensitivity for PC < 2 cm compared with either analysis alone (combined: 0.707, 2D: 0.467, 3D: 0.522), and previous research showed that approximately 40% of PCs smaller than 2 cm were missed on CT by human interpretation [5].
This study provides novel insights into the differential radiomic characteristics of PC. While the identified differential 3D radiomic features are either higher-order features or derived from filtered images and thus difficult to interpret, the results of 3D radiomic analysis indicated that differences in texture most distinctly differentiate between PC and noncancerous pancreas. In contrast, the finding of increased NGTDM: busyness (original), a measure of heterogeneity, and decreased First order: median (original), a measure of intensity, as the major differential 2D radiomic features of PC corresponded with the typical manifestations of PCs as heterogeneous hypodense masses on CT [21,22] and corroborated our previous study which first reported these two features as key differential features of PC [18]. Further research to explore the potential correlations between these features and clinicopathological characteristics or treatment response is warranted.
Besides making a binary prediction (PC vs non-PC), the CAD tool could further provide LRs to better assist clinicians in determining the subsequent diagnostic-therapeutic process. Endoscopic ultrasound-guided tissue sampling enables preoperative differentiation between PC and various noncancerous mimickers to avoid unnecessary surgery but carries a risk of tumor dissemination and false negativity [23]. The general consensus is that patients with resectable pancreatic masses can undergo surgery without preoperative tissue sampling if PC is highly favored based on imaging and clinical grounds, and the need for tissue sampling should be carefully weighed considering the likelihood of PC vs alternative diagnoses, surgical candidacy, and risk of biopsy-related tumor dissemination [24,25]. LR is a quantitative measure of the confidence of the binary prediction and can be multiplied with the pre-test odds determined based on clinical grounds and clinical experience to derive the post-test odds and probability of PC. In addition to the LRs based on the logistic regression model combining both 2D and 3D analyses, the CAD tool can also provide the LR based on the individual results of 2D and 3D analyses, thereby better informing the clinicians in choosing between direct surgery and tissue sampling. When maximal sensitivity or specificity is needed given the clinical scenario, the final prediction of the CAD tool could also be based on using 2D and 3D analysis in parallel or in series. Moreover, the pancreatic lesion identified by the segmentation model can serve to indicate the possible location of the tumor for further review by radiologists.
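The conversion from pre-test to post-test probability described above follows directly from the odds form of Bayes' theorem. In the sketch below, the 20% pre-test probability is a hypothetical value chosen purely for illustration, and the function name is an assumption; the likelihood ratios are the ones reported for the CAD tool.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical pre-test probability of 20% based on clinical grounds.
after_positive = post_test_probability(0.20, 5.17)  # CAD tool positive (+LR)
after_negative = post_test_probability(0.20, 0.10)  # CAD tool negative (-LR)
print(round(after_positive, 3), round(after_negative, 3))
```

Under this assumed pre-test probability, a positive result raises the probability of PC to roughly 56%, whereas a negative result lowers it to roughly 2%.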
This study had several strengths. By integrating a DL segmentation model trained with images from multiple institutions and races/ethnicities [12] and radiomic analysis with machine learning, the resultant CAD tool enables automatic end-to-end analysis without requiring manual image annotation/processing. The novel approach of combining 2D and 3D radiomic analyses, which respectively provide patch-wise and global interrogation of the pancreas and are thus complementary, yielded better performance than either analysis alone. Furthermore, the prospectively collected real-world population-based test set included variations in imaging equipment/parameters and quality inherent in real clinical practice and thus was the most rigorous test set ever used in studies on the usefulness of CT radiomics in detecting PC. The ability to achieve high accuracy in such a test set provided strong support for the robustness and generalizability of the CAD tool. Given that the diagnostic performance of radiologists is adversely affected by overloading and disparities in expertise/experience in the actual clinical environment, this tool holds potential for supplementing radiologists to reduce the miss rate and enhance the early detection of PC.
This study also had limitations. First, radiologist reports were not available from the NHI dataset; therefore, we could not compare the performance of the CAD tool with that of radiologist interpretation. Second, the Taiwanese population from which the nationwide test set was derived is predominantly Asian. While this study attested to the generalizability of the CAD tool in the Taiwanese and perhaps Asian populations, the potential generalizability to other races and ethnicities requires further evaluation. Third, this study focused on ascertaining the robustness of radiomics in differentiating between cases with PC and controls with normal pancreas in real-world multi-institutional settings. Whether radiomics can differentiate between PC and non-PC pancreatic diseases needs to be investigated in future research.

Conclusions
In conclusion, this study developed an end-to-end CAD tool for PC based on 2D and 3D radiomic analysis with machine learning. The CAD tool accurately and robustly detected PC on contrast-enhanced CT images and thus could be used to enhance the detection of PC.